This is a final project of the course: Introduction to Programming and Geostatistics within the Master Program Applied Earth Observation and Geoanalysis at the University of Würzburg with Professors Dr. Martin Wegmann and Dr. Jackob Schwalb-Willmann.
The project consists of several stages and has as a long-term goal the application of ecological niche modeling tools to determine the areas of highest species richness of the Delphinidae family and their interaction with the main fishing grounds and marine protected areas of the Western North Atlantic, Gulf of Mexico and the Caribbean Sea.
In this first phase, we will start with an academic exercise using the Stenella clymene or Clymene dolphin species to define the basis of the algorithm to be used for all species. Clymene dolphins are also known as “short-snouted spinners” because they often spin when they jump out of the water. Although there are no known major conservation problems for this species, it is likely that there are some undocumented problems. Some dolphins are killed in directed fisheries in the Caribbean, and others incidentally in nets throughout most of their range (Jefferson, 2018)
Clymene dolphin.
The Clymene dolphin is found only in the Atlantic Ocean, in tropical to warm-temperate waters. The exact range is not well documented, especially in South Atlantic, Mid-Atlantic and West African waters. Most sightings have been in deep, offshore waters, although Clymene dolphins are sometimes observed very close to shore where deep water approaches the coast (such as around some Caribbean islands). They are present year-round at least in the northern Gulf of Mexico and probably throughout much of their tropical range.
# 1
Preparation of oceanographic variables and presence data.
After this stage, the code for the processing of the 13 most representative species of the family Delphinidae in these regions will be presented. In these first stages we will work only with the GBIF (Global Biodiversity Information Facility) databases, then we will improve the presence databases by relying on information from Cuban institutions and other publications in the region (Barragán-Barrera et al., 2019).
The oceanographic variables used were selected from the MARSPEC (Sbrocco et al., 2013) and Bio-Oracle (Assis et al., 2018) datasets. These two data sets are the most widely used globally for species distribution modeling (Melo-Merino et al., 2020). Please refer to the respective websites for a description of the variables and the methodology used:
MARSPEC: http://www.marspec.org/ Bio-ORACLE: https://www.bio-oracle.org/index.php
It is possible to access these layers in many ways using APIs from within the R environment. One of the most widely used is the “sdmpredictors” package from Samuel Bosch (https://github.com/lifewatch/sdmpredictors)_. For the purposes of this work and to save time in the reproduction of the code, all the previous information was compiled by preprocessing each layer and preparing it for later use by the models. In this link (https://mega.nz/folder/QNJAwBqA#rYw4nxo4wa8KJ3CEgbVy4Q)_ you can download the layers already trimmed and resampled for the study area. The resampling codes can be found in the project folder.
It was necessary to investigate the availability of presence data in the GBIF database. For this purpose, the “dismo” package of Hijmans and collaborators _(https://rspatial.org/raster/sdm)_ was mainly used.
Exploration of the availability of points of presence and species selection:
# Load required packages
library(dismo)
library(ggplot2)
# Path to the enviromental variables
MARSPECT_path <- "D:/0_My_Analysis/Dolphin_distribution/Dolphin/MARSPECT_Resample"
BioOracle_path <- "D:/0_My_Analysis/Dolphin_distribution/Dolphin/BIO_Cliped"
# List all .tif and .tiff files in the folder
BioOracle_files <- list.files(path = BioOracle_path, pattern = "\\.(tif|tiff)$", full.names = TRUE, ignore.case = TRUE)
MARSPECT_files <- list.files(path = MARSPECT_path, pattern = "\\.(tif|tiff)$", full.names = TRUE, ignore.case = TRUE)
# Read the reference MARSPECT raster
reference_marspect_raster <- raster(MARSPECT_files[1])
# Read shapefile ROI
shp_ROI <- shapefile("D:/0_My_Analysis/Dolphin_distribution/Dolphin/Vector_Layers/Study_Area.shp")
shp_ROI_transformed <- spTransform(shp_ROI, proj4string(reference_marspect_raster))
# Define the species list with genus and species names
species_list <- list(
list(genus = "Lagenorhynchus", species = "albirostris"),
list(genus = "Feresa", species = "attenuata"),
list(genus = "Peponocephala", species = "electra"),
list(genus = "Globicephala", species = "macrorhynchus"),
list(genus = "Globicephala", species = "melas"),
list(genus = "Pseudorca", species = "crassidens"),
list(genus = "Grampus", species = "griseus"),
list(genus = "Orcinus", species = "orca"),
list(genus = "Steno", species = "bredanensis"),
list(genus = "Stenella", species = "attenuata"),
list(genus = "Tursiops", species = "truncatus"),
list(genus = "Stenella", species = "coeruleoalba"),
list(genus = "Phocoena", species = "phocoena")
)
# Initialize an empty vector to store the record counts
record_counts <- c()
# Loop through the species list and download records for each species
for (species_info in species_list) {
species_data <- gbif(
genus = species_info$genus,
species = species_info$species,
ext = shp_ROI_transformed,
download = FALSE,
geo = TRUE,
removeZeros = TRUE,
sp = FALSE
)
record_counts <- c(record_counts, print(species_data))
}## [1] 103
## [1] 225
## [1] 189
## [1] 1655
## [1] 1632
## [1] 378
## [1] 4380
## [1] 399
## [1] 639
## [1] 5575
## [1] 64664
## [1] 2128
## [1] 5967
# Combine species names and record counts
species_records <- data.frame(Species = sapply(species_list, function(x) paste(x$genus, x$species)), RecordCount = record_counts)
# Order the species by the number of records
species_records <- species_records[order(species_records$RecordCount, decreasing = TRUE), ]
# Create a bar plot
ggplot(species_records, aes(x = reorder(Species, -RecordCount), y = RecordCount, fill = Species)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = sample(colors(), length(species_list))) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Number of records available for each of the most representative dolphin species",
x = "Species",
y = "Number of records downloaded") +
guides(fill = FALSE)After downloading the pre-processed variables together with the study area, we begin to prepare the working environment of our project in R Studio. All the libraries are necessary, but there are three that are unique to this type of analysis: dismo, usdm and rJava. The dismo package provides functions to model species distribution using various algorithms, such as Maxent, Random Forest, and Boosted Regression Trees. It is essential for species distribution modeling because it enables the implementation and evaluation of different models.The usdm package (Uncertainty Analysis for Species Distribution Models) provides functions to quantify and analyze the uncertainty in species distribution models. It is crucial for species distribution modeling because it helps assess the reliability and accuracy of model predictions.The rJava package provides a low-level bridge between R and Java, allowing R to use Java libraries and functions. Although not directly related to species distribution modeling, it is essential when using certain functions or packages that depend on Java, such as the Maxent algorithm in the dismo package.
The line options(java.parameters = “-Xmx12g”) sets the maximum amount of memory that the Java Virtual Machine (JVM) can use within the R session. In this case, it is set to 12 gigabytes (12g).Keep in mind that the value you set for the maximum memory allocation should be based on the available memory on your system. Allocating too much memory to the JVM could cause your system to become unresponsive or crash. Be cautious when adjusting this value and ensure that your system has enough memory to accommodate your needs.
The next part of the script is to load and process environmental data from two sources, MARSPECT and Bio-Oracle, along with a region of interest (ROI) shapefile. It sets the paths to the relevant folders, lists all the .tif and .tiff files, reads and stacks the rasters, and combines them into a single raster stack called ‘Variables’. The script also prints the names of the layers in the stack and reads the ROI shapefile, transforming its projection to match the reference MARSPECT raster. The output is a combined raster stack with environmental data and a transformed shapefile representing the study area.
# List of required packages
required_packages <- c("raster", "sf", "sp", "dismo", "maptools", "usdm", "rJava", "mapview")
# Install and load packages
for (pkg in required_packages) {
if (!requireNamespace(pkg, quietly = TRUE)) {
install.packages(pkg)
}
library(pkg, character.only = TRUE)
}
# Load necessary libraries
library(raster)
library(sf)
library(sp)
library(dismo)
library(maptools)
library(usdm)
library(rJava)
library(mapview)
# Set Java memory limit
options(java.parameters = "-Xmx12g")
# Set the working directory
setwd("D:/0_My_Analysis/Dolphin_distribution/Dolphin")
# Set paths to the environmental variables
MARSPECT_path <- "D:/0_My_Analysis/Dolphin_distribution/Dolphin/MARSPECT_Resample"
BioOracle_path <- "D:/0_My_Analysis/Dolphin_distribution/Dolphin/BIO_Cliped"
# List all .tif and .tiff files in the folders
BioOracle_files <- list.files(path = BioOracle_path, pattern = "\\.(tif|tiff)$", full.names = TRUE, ignore.case = TRUE)
MARSPECT_files <- list.files(path = MARSPECT_path, pattern = "\\.(tif|tiff)$", full.names = TRUE, ignore.case = TRUE)
# Read and stack rasters
reference_marspect_raster <- raster(MARSPECT_files[1])
bio_rasters <- stack(lapply(BioOracle_files, raster))
marspect_rasters <- stack(lapply(MARSPECT_files, raster))
Variables <- stack(bio_rasters, marspect_rasters)
# Print the image collection
names(Variables)## [1] "CalciteMean" "ChrolophylMean"
## [3] "CurrentVelocityMax" "CurrentVelocityMean"
## [5] "CurrentVelocityMin" "DiffuseAttenuationMean"
## [7] "DissolvedOxygenMean" "IronMean"
## [9] "NitrateMean" "ParMean_BOv22"
## [11] "pH_BOv22" "PhosphateMean"
## [13] "PhytoPlanktonMean" "PrimaryProductivityMean"
## [15] "b1_EastWestAspect" "b10_MaximumMonthlySSS"
## [17] "b11_AnnualRangeSSS" "b12_AnnualVarianceSSS"
## [19] "b13_MeanAnnualSST" "b14_SSTcoldestIceFreeMonth"
## [21] "b15_SSTwarmestIceFreeMonth" "b16_AnnualRangeSST"
## [23] "b17_AnnualVarianceSST" "b2_NorthSouthAspect"
## [25] "b3_PlanCurvature" "b4_ProfileCurvature"
## [27] "b5_DistancetoShore" "b6_BathymetricSlope"
## [29] "b7_Concavity" "b8_MeanAnnualSSS"
## [31] "b9_MinimumMonthlySSS" "bathymetry_30s"
# Read and transform the ROI shapefile
shp_ROI <- shapefile("D:/0_My_Analysis/Dolphin_distribution/Dolphin/Vector_Layers/Study_Area.shp")
shp_ROI_transformed <- spTransform(shp_ROI, proj4string(reference_marspect_raster))This code block is focused on obtaining and preparing species presence data for the Stenella clymene (Clymene dolphin) from the Global Biodiversity Information Facility (GBIF) database and preparing it for Maxent modeling.
First, it downloads the species presence data using the gbif() function with parameters specifying the genus, species, and desired output format. The data is limited to a specific geographic extent defined by shp_ROI_transformed, and only records with georeferences are downloaded. The code then provides an overview of the data’s dimensions and column names and visualizes the presence points on a world map.
Next, the data is cleaned by identifying empty records and converting it to a SpatialPointsDataFrame with a defined Coordinate Reference System (CRS). To eliminate sample bias, the gridSample() function is used, which samples the presence points based on a raster grid with a specified resolution. The presence data is then combined with a set of random background points, generated using the randomPoints() function, and converted to SpatialPointsDataFrame objects.
Finally, the data is divided into training and testing sets using the k-fold method with a random seed set to 0 to ensure reproducibility. The presence and background points are split separately into training and testing datasets, which will be used for modeling and evaluation in the Maxent modeling process.
# Obtain species presence data from GBIF database
Stenella_clymene <- gbif(genus = "Stenella", #the genus name
species = "clymene", #the species name
ext = shp_ROI_transformed, #object to limit the geographic extent of the records
download=TRUE, #whether to download the records or just show the number
geo=TRUE, # whether to only download records that have a georeference
sp=FALSE) # whether to return a SpatialPointsDataFrame
# Explore the data
dim(Stenella_clymene)## [1] 268 135
# Visualize the presence points on a map
data(wrld_simpl)
plot(wrld_simpl, xlim=c(-100,-50), ylim=c(8,50),axes=TRUE, col="light yellow")
box() # restore the box around the map
points(Stenella_clymene$lon, Stenella_clymene$lat, col='red', cex=0.75) # add the points# Data cleaning
# Seach empty records
lonzero = subset(Stenella_clymene, lon==0)
lonzero[, 1:13]# Convert to SpatialPolygonsDataFrame and set CRS
coordinates(Stenella_clymene) <- ~lon+lat
crs(Stenella_clymene) <- crs(projection(reference_marspect_raster))
# Eliminate sample bias
e <- extent(Stenella_clymene)
r <- raster(e) # create a RasterLayer with the same extent
res(r) <- 0.25 # set the resolution of the cells to (for example) 1 degree
r <- extend(r, extent(r)+1)
Stenella_clymene_fix <- gridSample(Stenella_clymene, r, n=1) # sample
# Prepare data for Maxent
bg_points <- randomPoints(mask = Variables, #a Raster* object
n = 300, #Number of points
ext = extent(Stenella_clymene_fix),
p = Stenella_clymene_fix, #Presence points
excludep = TRUE) #If TRUE, presence points are excluded from background
# Convert to SpatialPointsDataFrame objects
bg_points <- SpatialPointsDataFrame(coords = bg_points, data = data.frame(id = 1:nrow(bg_points)), proj4string = CRS(proj4string(Variables)))
presence_points <- SpatialPointsDataFrame(coords = Stenella_clymene_fix, data = data.frame(id = 1:nrow(Stenella_clymene_fix)), proj4string = CRS(proj4string(Variables)))
# Split data into training and testing sets using k-fold
set.seed(0)
groupP <- kfold(presence_points, 5)
pres_train <- presence_points[groupP != 1, ]
pres_test <- presence_points[groupP == 1, ]
groupB <- kfold(bg_points, 5)
bg_train <- bg_points[groupB != 1, ]
bg_test <- bg_points[groupB == 1, ]Now we are need to exclude variables with high correlation from the environmental variables dataset to improve the performance of species distribution models.
First, the extract() function is used to obtain the values of environmental variables at the locations of the presence points in the training dataset. The method is set to “bilinear” for interpolation, the small parameter is set to FALSE, the summary function is set to mean, and missing values are removed with na.rm set to TRUE.
Next, the vifstep() function is applied to the extracted values of environmental variables. The Variance Inflation Factor (VIF) is a measure of multicollinearity among predictor variables, and vifstep() is a stepwise variable selection method based on VIF. It identifies and removes variables with high multicollinearity, as they can lead to unstable parameter estimates in the model.
Finally, the exclude() function is used to create a new dataset called Variables_Good that contains only the environmental variables with acceptable levels of multicollinearity. This dataset will be used for building the species distribution model without the influence of highly correlated variables.
# Exclude variables with high correlation
pres_train_value <- extract(Variables, pres_train,
method = "bilinear", #can be set to 'simple' or 'bilinear'
small = FALSE, #If the function always returns a number, even when the buffer does not include the center of a single cell.
fun = mean, #specify a function to summarize the extracted values. max or mean
na.rm = TRUE) #whether to remove missing values from the data before applying the summary function (fun) to the extracted values.
vifstep_var <- vifstep(pres_train_value)
Variables_Good <- exclude(Variables, vifstep_var)
names(Variables_Good)## [1] "CalciteMean" "CurrentVelocityMax"
## [3] "CurrentVelocityMin" "NitrateMean"
## [5] "ParMean_BOv22" "pH_BOv22"
## [7] "PhosphateMean" "PrimaryProductivityMean"
## [9] "b1_EastWestAspect" "b10_MaximumMonthlySSS"
## [11] "b12_AnnualVarianceSSS" "b15_SSTwarmestIceFreeMonth"
## [13] "b2_NorthSouthAspect" "b3_PlanCurvature"
## [15] "b5_DistancetoShore" "b6_BathymetricSlope"
## [17] "b7_Concavity" "bathymetry_30s"
An last we run a **Maxent *model**, evaluating its performance, and visualizing the results using the refined environmental variables and presence points data. MaxEnt (short for “Maximum Entropy”; Phillips et al., 2006) is the most widely used SDM algorithm. Elith et al. (2010) provide an explanation of the algorithm (and software) geared towards ecologists. MaxEnt is available as a stand-alone Java program. Dismo has a function ‘maxent’ that communicates with this program.
In this part we start with a custom function to extract environmental variables values within a buffer around each presence point. It returns the mean and standard deviation of the values within the buffer.Then the main Maxent model is created using the maxent() function with the refined environmental variables and training presence and background points. The custom my_extractor function is passed as an argument for extracting variables.The Model results visualization through plot(maxent_model) function creates a variable contribution plot, while response(maxent_model) generates a response plot.The evaluate() function assesses the model performance using the testing presence and background points. The results are stored in evaluate_Max.
Next, The predict() function generates a prediction map for species distribution using the trained Maxent model and refined environmental variables.A threshold is calculated using the ‘spec_sens’ method, and a presence/absence map is created by comparing the prediction values to this threshold.And last to Visualize the presence/absence map, mapview() function is used again to display the binary presence/absence map of the species distribution.
The key factors influencing the presence of the Stenella clymene species are as follows:
The predictive model for this species has an Area Under the Curve (AUC) value of 0.9265152, indicating a high degree of accuracy in its predictions. However, it is important to continuously calibrate the model because each iteration may slightly alter its accuracy and the projected area of the species’ presence. This variability primarily stems from the model’s reliance on pseudo-absences, which are generated randomly in our analysis.
Based on the variables identified, we can propose the following hypotheses about the factors influencing the presence of this dolphin species:
Nitrate concentration (average value): Nitrate levels might influence the presence of phytoplankton, which serves as the base of the marine food chain. Higher nitrate concentrations could support larger populations of prey species for the Clymene dolphin, making these areas more attractive for foraging.
Surface salinity (maximum value per month): The monthly maximum surface salinity could be a proxy for the dolphin’s preferred habitat conditions. It might indicate areas with lower freshwater input from rivers, which may provide more stable environments for the species.
Bathymetry (depth in meters): The depth of the water might be an important factor because it can determine the types of prey available and influence the dolphin’s foraging behavior. Clymene dolphins are known to inhabit both shallow and deep waters, but their distribution might be influenced by the depth-dependent availability of prey species.
Dissolved oxygen (average value): Dissolved oxygen is crucial for the survival of marine organisms, including the prey species of the Clymene dolphin. Areas with higher dissolved oxygen concentrations might support more abundant and diverse prey populations, making them more attractive to the dolphins.
Distance to the coast (in kilometers): Proximity to the coast might affect the presence of the Clymene dolphin by influencing the abundance and distribution of prey species. Coastal areas can have high levels of primary productivity, which in turn supports a diverse and abundant prey base for the dolphins.
Primary productivity (average value): Primary productivity is a measure of the rate at which plants and algae produce organic matter through photosynthesis. Higher primary productivity levels can lead to greater availability of food resources for the marine food chain, including the prey species consumed by Clymene dolphins. As a result, the dolphins might be more likely to inhabit areas with higher primary productivity.
These hypotheses can help guide further research into the ecological factors influencing the distribution and habitat preferences of the Clymene dolphin. However, it is important to note that these hypotheses are based on correlations and should be tested with more in-depth studies to establish causal relationships.
# Function to extract variables in a larger area
my_extractor <- function(raster_layer, points) {
extract(raster_layer, buffer(points, width=3)) %>%
apply(2, function(x) c(mean(x), sd(x)))
}
#Run Maxent model
maxent_model <- maxent(x=Variables_Good,
p=pres_train,
a=bg_train,
factors=NULL,
removeDuplicates=TRUE,
extract=my_extractor,
args=c("-P", "autorun", "nowarnings"),
path="D:/0_My_Analysis/Dolphin_distribution/Dolphin/Maxent_model")
# Plot Maxent model results
plot(maxent_model) # Variable Contribution# Evaluate the model with test points
evaluate_Max <- evaluate(pres_test, bg_test, maxent_model, Variables_Good)
evaluate_Max## class : ModelEvaluation
## n presences : 22
## n absences : 60
## AUC : 0.9242424
## cor : 0.710444
## max TPR+TNR at : 0.2903666
# Make predictions using the model
Stenella_clymene_Prediction <- predict(maxent_model, Variables_Good, ext=extent(reference_marspect_raster))
# Visualize Maxent raw values
mapview(Stenella_clymene_Prediction, zcol = "value",
map.types = "OpenStreetMap",
legend = TRUE,
alpha = 0.5,
col.regions = viridis::viridis,
main = "Maxent, raw values")# Apply the threshold and create a presence/absence map
tr <- threshold(evaluate_Max, 'spec_sens')
presence_absence <- Stenella_clymene_Prediction > tr
mapview(presence_absence, zcol = "value",
map.types = "OpenStreetMap",
legend = TRUE,
alpha = 0.3,
col.regions = c("white", "red"),
main = "presence/absence")# Generate 'response plots'
.jinit() # initializes the Java Virtual Machine in R## [1] 0
response(maxent_model) # A response plot
# 6 Analysis of the result with Protected Areas and Exclusive Economic
Zone
The last part of the analysis we reads in two shapefiles representing Exclusive Economic Zone and International Hydrographic Organization (EEZ-IHO) areas and Protected Areas (WDPA) and transforms them to match the projection of a raster file called “presence_absence”. It then extracts the presence-absence values for each polygon in both shapefiles, calculates the count of presence pixels within each polygon, and adds the presence count as a new column to each shapefile. The code then calculates the sum of Presence_Count for each unique value in the EEZ and ISO3 columns of the EEZ-IHO and WDPA shapefiles, respectively, and prints the results in descending order. This code is likely being used to analyze the distribution of dolphins in relation to protected areas and EEZ-IHO areas.
In a preliminary analysis of the governance of the potential habitat of the species, we realise that the United States, Mexico and Cuba have a predominant influence on the conservation of the species. In addition, the low coverage of the current system of protected areas in this region is evident, without discriminating against areas that are not approved or do not have a management plan.
This preliminary work has a high potential for further analysis of the status of marine dolphin species.
# Compare result with Exclusive Economic Zone and International Hydrographic Organization (EEZ - IHO) areas
shp_EEZIHO <- shapefile("D:/0_My_Analysis/Dolphin_distribution/Dolphin/Vector_Layers/EEZ_IHOv42020.shp")
shp_EEZIHO_transformed <- spTransform(shp_EEZIHO, proj4string(reference_marspect_raster))
# Extract presence-absence values for each polygon
EEZIHO_ext <- extract(presence_absence, shp_EEZIHO_transformed)
# Calculate the count of presence pixels within each polygon
EEZIHO_counts <- sapply(EEZIHO_ext, function(x) sum(x == 1, na.rm = TRUE))
# Add the presence count as a new column to the shapefile
shp_EEZIHO_transformed@data$EEZIHO_counts <- EEZIHO_counts
# Calculate the sum of Presence_Count for each unique value in the IHO_SEA column
EEZIHO_sum <- aggregate(EEZIHO_counts ~ EEZ, data = shp_EEZIHO_transformed, FUN = sum)
EEZIHO_sum <- subset(EEZIHO_sum, EEZIHO_counts > 0)
EEZIHO_sum <- EEZIHO_sum[order(EEZIHO_sum$EEZIHO_counts, decreasing = TRUE), ]
print(EEZIHO_sum)## EEZ EEZIHO_counts
## 52 United States Exclusive Economic Zone 11394
## 32 Mexican Exclusive Economic Zone 1106
## 17 Cuban Exclusive Economic Zone 562
## 50 Trinidad and Tobago Exclusive Economic Zone 313
## 53 Venezuelan Exclusive Economic Zone 299
## 41 Puerto Rican Exclusive Economic Zone 288
## 20 Dominican Republic Exclusive Economic Zone 239
## 40 Panamanian Exclusive Economic Zone 171
## 21 Grenadian Exclusive Economic Zone 162
## 47 Saint Vincent and the Grenadines Exclusive Economic Zone 148
## 4 Bahamas Exclusive Economic Zone 147
## 5 Barbados Exclusive Economic Zone 137
## 11 Colombian Exclusive Economic Zone 124
## 22 Guadeloupean Exclusive Economic Zone 106
## 6 Belizean Exclusive Economic Zone 101
## 25 Haitian Exclusive Economic Zone 97
## 54 Virgin Islander Exclusive Economic Zone 92
## 31 Martinican Exclusive Economic Zone 82
## 2 Antigua and Barbuda Exclusive Economic Zone 71
## 9 British Virgin Islands Exclusive Economic Zone 62
## 46 Saint Lucia Exclusive Economic Zone 51
## 19 Dominican Exclusive Economic Zone 48
## 27 Jamaican Exclusive Economic Zone 45
## 34 Nicaraguan Exclusive Economic Zone 38
## 1 Anguilla Exclusive Economic Zone 32
## 16 Costa Rican Exclusive Economic Zone 25
## 26 Honduran Exclusive Economic Zone 25
## 42 Saba Exclusive Economic Zone 25
## 45 Saint Kitts and Nevis Exclusive Economic Zone 17
## 23 Guatemalan Exclusive Economic Zone 12
## 33 Montserrat Exclusive Economic Zone 12
## 48 Sint-Eustatius Exclusive Economic Zone 7
## 43 Saint-Barthélemy Exclusive Economic Zone 5
## 44 Saint-Martin Exclusive Economic Zone 4
## 51 Turks and Caicos Exclusive Economic Zone 4
## 8 Bonaire Exclusive Economic Zone 3
## 18 Curaçaoan Exclusive Economic Zone 1
## 37 Overlapping claim: Trinidad and Tobago / Venezuela / Guyana 1
## 49 Sint-Maarten Exclusive Economic Zone 1
# Compare result with Protect Area (WDPA) areas
shp_WDPA <- shapefile("D:/0_My_Analysis/Dolphin_distribution/Dolphin/Vector_Layers/WDPA_Apr2023.shp")
shp_WDPA_transformed <- spTransform(shp_WDPA, proj4string(reference_marspect_raster))
# Extract presence-absence values for each polygon
WDPA_values <- extract(presence_absence, shp_WDPA_transformed)
# Calculate the count of presence pixels within each polygon
WDPA_counts <- sapply(WDPA_values, function(x) sum(x == 1, na.rm = TRUE))
# Add the presence count as a new column to the shapefile
shp_WDPA_transformed@data$WDPA_counts <- WDPA_counts
# Calculate the sum of Presence_Count for each unique value in the IHO_SEA column
WDPA_sum <- aggregate(WDPA_counts ~ ISO3, data = shp_WDPA_transformed, FUN = sum)
WDPA_sum <- subset(WDPA_sum, WDPA_counts > 0)
WDPA_sum <- WDPA_sum[order(WDPA_sum$WDPA_counts, decreasing = TRUE), ]
print(WDPA_sum)## ISO3 WDPA_counts
## 35 USA 720
## 13 CUB 219
## 27 MEX 192
## 7 BLZ 76
## 31 PRI 49
## 18 GLP 45
## 38 VGB 45
## 37 VEN 40
## 21 HND 33
## 11 CRI 30
## 28 MTQ 29
## 4 BES 28
## 23 JAM 28
## 34 TTO 26
## 36 VCT 25
## 17 DOM 23
## 19 GRD 23
## 30 PAN 22
## 20 GTM 18
## 25 LCA 17
## 29 NIC 14
## 2 AIA 10
## 24 KNA 10
## 9 BRB 8
## 39 VIR 8
## 3 ATG 7
## 16 DMA 7
## 26 MAF 7
## 10 COL 5
## 6 BLM 4
## 5 BHS 2
## 32 SXM 1
Barragán-Barrera, D. C., do Amaral, K. B., Chávez-Carreño, P. A., Farías-Curtidor, N., Lancheros-Neva, R., Botero-Acosta, N., Bueno, P., Moreno, I. B., Bolaños-Jiménez, J., Bouveret, L., Castelblanco-Martínez, D. N., Luksenburg, J. A., Mellinger, J., Mesa-Gutiérrez, R., de Montgolfier, B., Ramos, E. A., Ridoux, V., & Palacios, D. M. (2019). Ecological Niche Modeling of Three Species of Stenella Dolphins in the Caribbean Basin, With Application to the Seaflower Biosphere Reserve. Frontiers in Marine Science, 6, 10. https://doi.org/10.3389/fmars.2019.00010
Carmezim, J., Pennino, M. G., Martínez-Minaya, J., Conesa, D., & Coll, M. (2022). A mesoscale analysis of relations between fish species richness and environmental and anthropogenic pressures in the Mediterranean Sea. Marine Environmental Research, 180, 105702. https://doi.org/10.1016/j.marenvres.2022.105702
Jefferson, T. A. (2018). Clymene Dolphin. In Encyclopedia of Marine Mammals (pp. 197–200). Elsevier. https://doi.org/10.1016/B978-0-12-804327-1.00093-5
Jefferson, T.A., Webber, M.A., and Pitman, R.L. (2015). “Marine Mammals of the World: A Comprehensive Guide to Their Identification,” 2nd ed. Elsevier, San Diego, CA.
Melo-Merino, S. M., Reyes-Bonilla, H., & Lira-Noriega, A. (2020). Ecological niche models and species distribution models in marine environments: A literature review and spatial analysis of evidence. Ecological Modelling, 415, 108837. https://doi.org/10.1016/j.ecolmodel.2019.108837
Sbrocco, E. J., & Barber, P. H. (2013). MARSPEC: Ocean climate layers for marine spatial ecology: Ecological Archives E094-086. Ecology, 94(4), 979–979. https://doi.org/10.1890/12-1358.1
Assis J, Tyberghein L, Bosch S, Verbruggen H, Serr~ao EA, De Clerck O (2018). Bio-ORACLE v2.0: Extending marine data layers for bioclimatic modelling. Global Ecol Biogeogr. ;27:277–284. https://doi.org/10.1111/geb.12693
Perrin, W. F., Würsig, B., & Thewissen, J. G. M. (Eds.). (2018). Encyclopedia of Marine Mammals (3rd ed.). Academic Press.